NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Unraveling Block Maxima Forecasting Models with Counterfactual Explanation

https://doi.org/10.1145/3637528.3671923

Deng, Yue; Galib, Asadullah Hill; Tan, Pang-Ning; Luo, Lifeng (August 2024, ACM)

Full Text Available
Abrupt changes in algal biomass of thousands of US lakes are related to climate and are more likely in low-disturbance watersheds

https://doi.org/10.1073/pnas.2416172122

Soranno, Patricia A; Hanly, Patrick J; Webster, Katherine E; Wagner, Tyler; McDonald, Andrew; Shuvo, Arnab; Schliep, Erin M; Reinl, Kaitlin L; McCullough, Ian M; Tan, Pang-Ning; et al (March 2025, Proceedings of the National Academy of Sciences)

Climate change is predicted to intensify lake algal blooms globally and result in regime shifts. However, observed increases in algal biomass do not consistently correlate with air temperature or precipitation, and evidence is lacking for a causal effect of climate or the nonlinear dynamics needed to demonstrate regime shifts. We modeled the causal effects of climate on annual lake chlorophyll (a measure of algal biomass) over 34 y for 24,452 lakes across broad ecoclimatic zones of the United States and evaluated the potential for regime shifts. We found that algal biomass was causally related to climate in 34% of lakes. In these cases, 71% exhibited abrupt but mostly temporary shifts as opposed to persistent changes, 13% had the potential for regime shifts. Climate was causally related to algal biomass in lakes experiencing all levels of human disturbance, but with different likelihood. Climate causality was most likely to be observed in lakes with minimal human disturbance and cooler summer temperatures that have increased over the 34 y studied. Climate causality was variable in lakes with low to moderate human disturbance, and least likely in lakes with high human disturbance, which may mask climate causality. Our results explain some of the previously observed heterogeneous climate responses of lake algal biomass globally and they can be used to predict future climate effects on lakes.
more » « less
Free, publicly-accessible full text available March 4, 2026
SimEXT: Self-supervised Representation Learning for Extreme Values in Time Series

https://doi.org/10.1109/ICDM58522.2023.00119

Galib, Asadullah Hill; Tan, Pang-Ning; Luo, Lifeng (December 2023, IEEE)

Full Text Available
Population Graph Cross-Network Node Classification for Autism Detection Across Sample Groups

https://doi.org/10.1109/ICDMW60847.2023.00050

Stephens, Anna; Santos, Francisco; Tan, Pang-Ning; Esfahanian, Abdol-Hossein (December 2023, IEEE International Conference on Data Mining Workshops)
Jihe Wang, Yi He (Ed.)
Graph neural networks (GNN) are a powerful tool for combining imaging and non-imaging medical information for node classification tasks. Cross-network node classification extends GNN techniques to account for domain drift, allowing for node classification on an unlabeled target network. In this paper we present OTGCN, a powerful, novel approach to cross-network node classification. This approach leans on concepts from graph convolutional networks to harness insights from graph data structures while simultaneously applying strategies rooted in optimal transport to correct for the domain drift that can occur between samples from different data collection sites. This blended approach provides a practical solution for scenarios with many distinct forms of data collected across different locations and equipment. We demonstrate the effectiveness of this approach at classifying Autism Spectrum Disorder subjects using a blend of imaging and non-imaging data.
more » « less
Full Text Available
Influence Propagation for Linear Threshold Model with Graph Neural Networks

https://doi.org/10.1109/ICDMW60847.2023.00149

Santos, Francisco; Stephens, Anna; Tan, Pang-Ning; Esfahanian, Abdol-Hossein (December 2023, IEEE)
Jihe Wang, Yi He (Ed.)
Influence propagation is a network phenomenon governing how information is diffused in a network. With the advent of deep learning, there has been growing interest in applying graph neural networks to extract salient feature representation of the nodes for a variety of network mining tasks, such as forecasting the virality of information cascade. Given the importance of social influence, this paper presents a novel deep learning framework called IP-GNN for simulating the information propagation process in a complex network and learning a node representation that embeds information about the diffusion process under the linear threshold model. Our framework employs a modified graph convolutional network architecture with adaptive diffusion kernel to capture long-range propagation of information along with an entropy-regularized mixture of loss functions to ensure accurate prediction and faster convergence of the learning algorithm. Experimental results on 4 real-world datasets show that the model accurately mimics the output of the linear threshold model, achieving an average accuracy that exceeds 90\% on all datasets.
more » « less
Full Text Available
Self-Recover: Forecasting Block Maxima in Time Series from Predictors with Disparate Temporal Coverage Using Self-Supervised Learning

https://doi.org/10.24963/ijcai.2023/414

Galib, Asadullah Hill; McDonald, Andrew; Tan, Pang-Ning; Luo, Lifeng (August 2023, Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence (IJCAI 2023))

Forecasting the block maxima of a future time window is a challenging task due to the difficulty in inferring the tail distribution of a target variable. As the historical observations alone may not be sufficient to train robust models to predict the block maxima, domain-driven process models are often available in many scientific domains to supplement the observation data and improve the forecast accuracy. Unfortunately, coupling the historical observations with process model outputs is a challenge due to their disparate temporal coverage. This paper presents Self-Recover, a deep learning framework to predict the block maxima of a time window by employing self-supervised learning to address the varying temporal data coverage problem. Specifically Self-Recover uses a combination of contrastive and generative self-supervised learning schemes along with a denoising autoencoder to impute the missing values. The framework also combines representations of the historical observations with process model outputs via a residual learning approach and learns the generalized extreme value (GEV) distribution characterizing the block maxima values. This enables the framework to reliably estimate the block maxima of each time window along with its confidence interval. Extensive experiments on real-world datasets demonstrate the superiority of Self-Recover compared to other state-of-the-art forecasting methods.
more » « less
Full Text Available
Fairness-Aware Graph Sampling for Network Analysis

Masrour, Farzan; Santos, Francisco; Tan, Pang-Ning; Esfahanian, Abdol-Hossein (November 2022, Proceedings of the IEEE International Conference on Data Mining)

Network sampling is the task of selecting a subset of nodes and links from a network in a way that preserves its topological properties and other user requirements. This paper investigates the problem of generating an unbiased network sample that contains balanced proportion of nodes from different groups. Creating such a representative sample would require handling the trade-off between ensuring structural preservability and group representativity of the selected nodes. We present a novel max-min subgraph fairness measure that can be used as a unifying framework to combine both criteria. A greedy algorithm is then proposed to generate a fair and representative sample from an initial set of target nodes. A theoretical approximation guarantee for the output of the proposed greedy algorithm based on submodularity and curvature ratios is also presented. Experimental results on real-world datasets show that the proposed method will generate more fair and representative samples compared to other existing network sampling methods.
more » « less
Full Text Available
COMET Flows: Towards Generative Modeling of Multivariate Extremes and Tail Dependence

https://doi.org/10.24963/ijcai.2022/462

McDonald, Andrew; Tan, Pang-Ning; Luo, Lifeng (July 2022, Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence)

Normalizing flows—a popular class of deep generative models—often fail to represent extreme phenomena observed in real-world processes. In particular, existing normalizing flow architectures struggle to model multivariate extremes, characterized by heavy-tailed marginal distributions and asymmetric tail dependence among variables. In light of this shortcoming, we propose COMET (COpula Multivariate ExTreme) Flows, which decompose the process of modeling a joint distribution into two parts: (i) modeling its marginal distributions, and (ii) modeling its copula distribution. COMET Flows capture heavy-tailed marginal distributions by combining a parametric tail belief at extreme quantiles of the marginals with an empirical kernel density function at mid-quantiles. In addition, COMET Flows capture asymmetric tail dependence among multivariate extremes by viewing such dependence as inducing a low-dimensional manifold structure in feature space. Experimental results on both synthetic and real-world datasets demonstrate the effectiveness of COMET flows in capturing both heavy-tailed marginals and asymmetric tail dependence compared to other state-of-the-art baseline architectures. All code is available at https://github.com/andrewmcdonald27/COMETFlows.
more » « less
Full Text Available
Unsupervised Anomaly Detection by Robust Density Estimation

https://doi.org/10.1609/aaai.v36i4.20328

Liu, Boyang; Tan, Pang-Ning; Zhou, Jiayu (June 2022, Proceedings of the AAAI Conference on Artificial Intelligence)

Density estimation is a widely used method to perform unsupervised anomaly detection. By learning the density function, data points with relatively low densities are classified as anomalies. Unfortunately, the presence of anomalies in training data may significantly impact the density estimation process, thereby imposing significant challenges to the use of more sophisticated density estimation methods such as those based on deep neural networks. In this work, we propose RobustRealNVP, a deep density estimation framework that enhances the robustness of flow-based density estimation methods, enabling their application to unsupervised anomaly detection. RobustRealNVP differs from existing flow-based models from two perspectives. First, RobustRealNVP discards data points with low estimated densities during optimization to prevent them from corrupting the density estimation process. Furthermore, it imposes Lipschitz regularization to the flow-based model to enforce smoothness in the estimated density function. We demonstrate the robustness of our algorithm against anomalies in training data from both theoretical and empirical perspectives. The results show that our algorithm achieves competitive results as compared to state-of-the-art unsupervised anomaly detection methods.
more » « less
Full Text Available
DeepGPD: A Deep Learning Approach for Modeling Geospatio-Temporal Extreme Events

https://doi.org/10.1609/aaai.v36i4.20344

Wilson, Tyler; Tan, Pang-Ning; Luo, Lifeng (June 2022, Proceedings of the AAAI Conference on Artificial Intelligence)

Geospatio-temporal data are pervasive across numerous application domains.These rich datasets can be harnessed to predict extreme events such as disease outbreaks, flooding, crime spikes, etc.However, since the extreme events are rare, predicting them is a hard problem. Statistical methods based on extreme value theory provide a systematic way for modeling the distribution of extreme values. In particular, the generalized Pareto distribution (GPD) is useful for modeling the distribution of excess values above a certain threshold. However, applying such methods to large-scale geospatio-temporal data is a challenge due to the difficulty in capturing the complex spatial relationships between extreme events at multiple locations. This paper presents a deep learning framework for long-term prediction of the distribution of extreme values at different locations. We highlight its computational challenges and present a novel framework that combines convolutional neural networks with deep set and GPD. We demonstrate the effectiveness of our approach on a real-world dataset for modeling extreme climate events.
more » « less
Full Text Available

« Prev Next »

Search for: All records